A dictionary- and rule-based system for identification of bacteria and habitats in text
Authors
Abstract
The number of scientific papers published each year is growing exponentially, and at this rate automated information extraction is needed to mine the literature efficiently. A critical first step in this process is to accurately recognize the names of entities in text. Previous efforts, such as SPECIES, have identified bacterial strain names, among other taxonomic groups, but have been limited to names present in the NCBI Taxonomy. We have implemented a dictionary-based named entity tagger, TagIt, followed by a rule-based expansion system, to identify bacterial strain names and habitats and resolve them to the closest possible match in the NCBI Taxonomy and the OntoBiotope ontology, respectively. The rule-based post-processing steps expand acronyms and extend strain names according to a set of rules, capturing additional aliases and strains that are not present in the dictionary. TagIt has the best performance of the three entries to BioNLP-ST BB3 cat+ner, with an overall slot error rate (SER) of 0.628 on the independent test set.
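The two-stage idea described in the abstract, a dictionary lookup followed by rule-based expansion, can be illustrated with a minimal sketch. This is not the TagIt implementation: the toy dictionary, the placeholder identifiers, the function names, and the two regular-expression rules (abbreviating the genus to its initial, and extending a match over a following strain designator) are all assumptions made for illustration. Extended names keep the identifier of the species they were built from, which loosely mirrors resolving to the closest available match.

```python
import re

# Hypothetical toy dictionary: surface form -> identifier.
# The identifiers are placeholders, not real NCBI Taxonomy IDs.
DICTIONARY = {
    "escherichia coli": "TAXON:1",
    "bacillus subtilis": "TAXON:2",
}

# Assumed strain rule: an optional word "strain", then one token that starts
# with an uppercase letter or digit and contains at least one digit.
STRAIN = re.compile(r"(?:\s+strain)?\s+(?=[A-Za-z0-9-]*\d)[A-Z0-9][A-Za-z0-9-]*")


def tag_dictionary(text, dictionary):
    """Stage 1: case-insensitive dictionary lookup, longest names first."""
    matches = []
    for name in sorted(dictionary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Skip hits that overlap an already accepted (longer) match.
            if any(m.start() < e and s < m.end() for s, e, _, _ in matches):
                continue
            matches.append((m.start(), m.end(), m.group(0), dictionary[name]))
    return sorted(matches)


def expand_abbreviations(text, matches):
    """Stage 2a: a tagged 'Genus species' also licenses 'G. species'."""
    extra = set()
    for _, _, surface, ident in matches:
        parts = surface.split()
        if len(parts) != 2:
            continue
        genus, species = parts
        pattern = re.compile(
            r"\b" + re.escape(genus[0]) + r"\.\s+" + re.escape(species) + r"\b"
        )
        for m in pattern.finditer(text):
            extra.add((m.start(), m.end(), m.group(0), ident))
    return sorted(set(matches) | extra)


def extend_strains(text, matches):
    """Stage 2b: extend a match over a directly following strain designator."""
    extended = []
    for start, end, _, ident in matches:
        m = STRAIN.match(text, end)
        if m:
            end = m.end()
        # The extended name keeps the identifier of its parent species.
        extended.append((start, end, text[start:end], ident))
    return extended


def tag(text, dictionary=DICTIONARY):
    """Run dictionary tagging, then both rule-based expansion steps."""
    matches = tag_dictionary(text, dictionary)
    matches = expand_abbreviations(text, matches)
    return extend_strains(text, matches)


if __name__ == "__main__":
    sample = "Escherichia coli K-12 grew well. E. coli was found in soil."
    for start, end, surface, ident in tag(sample):
        print(start, end, surface, ident)
```

Running the rules after the dictionary pass, rather than adding every variant to the dictionary, keeps the dictionary small while still catching aliases and strain names it does not list.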
Related resources
A High Capacity Email Steganography Scheme using Dictionary
The main objective of steganography is to conceal a secret message within a cover medium in such a way that only the intended receiver can discern the presence of the hidden message. The cover medium can be text, email, audio, image, or video, which can be transmitted through a public channel, such as the Internet. By extending the use of email among Internet users, the provision of email steg...
Improving Persian Named Entity Recognition Using the Ezafe Marker
Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.), and organizations (public and private companies, international institutions, etc.), as well as dates, currencies, and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Classification of Habitats in Bakhtegan wetland (Fars Province) in Mediterranean Wetland System
Understanding ecosystems and their zonation based on their characteristics and sensitivities is an essential step in protecting natural resources and the sustainability of their habitats. The Mediterranean Wetland Habitat Classification System (MEDWET) is used to identify Bakhtegan wetland, Fars Province, Iran, by hierarchical classification of their habitats, in which they are identified and d...
A dictionary to identify small molecules and drugs in free text
MOTIVATION: The scientific community has spent a lot of effort on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database iden...
DESIGN AND IMPLEMENTATION OF FUZZY EXPERT SYSTEM FOR REAL ESTATE RECOMMENDATION
Publication date: 2016